Speech corpus

A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition or speaker identification engine). In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.

A corpus is a single such database. Corpora is the plural of corpus and refers to several such databases.
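As a rough illustration of the "audio files plus text transcriptions" structure described above, a corpus can be modelled as a list of paired records. The sketch below is a minimal example under stated assumptions, not the layout of any particular corpus: the `Utterance` class, the `load_corpus` function, and the convention that each `.wav` file has a `.txt` transcription with the same file stem are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Utterance:
    """One corpus entry (hypothetical structure for illustration)."""
    audio_path: Path   # recorded speech, assumed here to be a WAV file
    transcript: str    # text transcription of that recording


def load_corpus(root: Path) -> list[Utterance]:
    """Pair every *.wav directly under root with the *.txt of the same stem.

    Assumed layout: root/a.wav + root/a.txt, root/b.wav + root/b.txt, ...
    Audio files without a transcription are skipped.
    """
    utterances = []
    for wav in sorted(root.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            text = txt.read_text(encoding="utf-8").strip()
            utterances.append(Utterance(wav, text))
    return utterances
```

Calling `load_corpus(Path("my_corpus"))` on a directory laid out this way returns one `Utterance` per transcribed recording. Real corpora usually add further metadata, such as speaker identity or time alignments, which this sketch leaves out.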

There are two types of speech corpora:

Read Speech, which includes:

Book excerpts
Broadcast news
Lists of words
Sequences of numbers

Spontaneous Speech, which includes:

Dialogs – between two or more people (includes meetings; one such corpus is the KEC);
Narratives – a person telling a story (one such corpus is the Buckeye Corpus);
Map-tasks – one person explains a route on a map to another;
Appointment-tasks – two people try to find a common meeting time based on individual schedules.

Non-native speech databases, which contain speech with a foreign accent, are a special kind of speech corpus.

See also

Arabic Speech Corpus
Common Voice
EXMARaLDA
Lingua Libre, an online free software tool
List of children's speech corpora
Non-native speech database
Praat
Spoken English Corpus
The BABEL Speech Corpus
TIMIT
Transcriber
Transcription (linguistics)

Further reading

Edwards, Jane / Lampert, Martin (eds.) (1992): Talking Data – Transcription and Coding in Discourse Research. Hillsdale: Erlbaum.
Leech, Geoffrey / Myers, Greg / Thomas, Jenny (eds.) (1995): Spoken English on Computer: Transcription, Markup and Application. Harlow: Longman.

External links

Santa Barbara Corpus of Spoken American English
The Buckeye Corpus of Conversational Speech
The KEC: the Karl Eberhards Corpus of spontaneously spoken southern German in dialogues (audio and articulatory recordings)
Spoken Language Corpora at the Research Center on Multilingualism
The Spoken Turkish Corpus at METU Ankara
Spoken Corpus Klient with the Corp-Oral Corpus at ILTEC Lisbon
VoxForge – open source speech corpora
OLAC: Open Language Archives Community
BAS Bavarian Archive for Speech Signals
Simmortel Speech Recognition Corpus for Indian English and Hindi
ELRA: the European Language Resources Association
The PELCRA Conversational Corpus of Polish
The Arabic Speech Corpus
Corpus of Political Speeches: free access to political speeches by American and Chinese politicians, developed by Hong Kong Baptist University Library
Large Multimodal Corpus of Human Speech

Categories: Corpora, Corpus Linguistics, Speech Recognition, Dialectology, Phonetics, Language Documentation
